Abstract

We prove a Tauberian theorem for nonexpansive operators, and apply it to the model of zero-sum stochastic games. Under mild assumptions, we prove that the value $v_\lambda$ of the $\lambda$-discounted game converges uniformly when $\lambda$ goes to 0 if and only if the value $v_n$ of the $n$-stage game converges uniformly when $n$ goes to infinity. This generalizes the Tauberian theorem of Lehrer and Sorin [6] to the two-player zero-sum case. We also provide the first example of a stochastic game with public signals on the state and perfect observation of actions, with finite state space, signal sets and action sets, in which, for some initial state $k_1$ known by both players, $(v_\lambda(k_1))$ and $(v_n(k_1))$ converge to distinct limits.


Introduction 

Zero-sum stochastic games were introduced by Shapley [23]. In this model, two players repeatedly play a zero-sum game, which depends on the state of nature. At each stage, a new state of nature is drawn from a distribution based on the actions of the players and the state of the previous stage. The state of nature is announced to both players, along with the actions of the previous stage. There are several ways to evaluate the payoff in a stochastic game. For $n \in \mathbb{N}^*$, the payoff in the $n$-stage game is the Cesàro mean $\frac{1}{n}\sum_{m=1}^{n} g_m$, where $g_m$ is the payoff at stage $m \geq 1$. For $\lambda \in (0,1]$, the payoff in the $\lambda$-discounted game is the Abel mean $\sum_{m \geq 1} \lambda(1-\lambda)^{m-1} g_m$. Under mild assumptions, the $n$-stage game and the $\lambda$-discounted game have a value, denoted respectively by $v_n$ and $v_\lambda$ (see Maitra and Parthasarathy [8] and Nowak [13]).
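To make the two evaluations concrete, here is a minimal sketch (our own illustration, not taken from the paper) that computes the Cesàro mean of the first $n$ payoffs and a truncated Abel mean of a fixed payoff sequence. The function names and the truncation horizon are assumptions of the sketch.

```python
def cesaro_mean(payoffs):
    """n-stage evaluation: (1/n) * sum of the first n stage payoffs."""
    return sum(payoffs) / len(payoffs)


def abel_mean(payoff_at, lam, horizon=100_000):
    """lambda-discounted evaluation: sum over m >= 1 of lam * (1 - lam)^(m - 1) * g_m,
    truncated after `horizon` stages (the omitted tail has total weight (1 - lam)^horizon)."""
    total, weight = 0.0, lam
    for m in range(1, horizon + 1):
        total += weight * payoff_at(m)
        weight *= 1.0 - lam
    return total


def alternating(m):
    """Hypothetical payoff sequence 1, 0, 1, 0, ... (stages are numbered from 1)."""
    return 1.0 if m % 2 == 1 else 0.0


if __name__ == "__main__":
    print(cesaro_mean([alternating(m) for m in range(1, 1001)]))  # 0.5
    print(abel_mean(alternating, 0.01), 1 / (2 - 0.01))  # both close to 0.5025
```

For this sequence the Abel mean equals $1/(2-\lambda)$ exactly, so both evaluations tend to 1/2.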

A large part of the literature focuses on the existence of the limit of $v_n$ when $n$ goes to infinity, and of the limit of $v_\lambda$ when $\lambda$ goes to 0. Bewley and Kohlberg [1] proved that $(v_n)$ and $(v_\lambda)$ converge to the same limit when the state space and action sets are finite. For Markov Decision Processes, this result extends to the case of a compact state space, infinite action sets and 1-Lipschitz transitions (see Rosenberg, Solan and Vieille [20] and Renault [16]). For absorbing games, this result extends to the case of infinite state space, compact action sets and continuous payoff and transition functions (see Mertens, Neyman and Rosenberg [9]). Vigeral [26] provided an example of a stochastic game with finite state space and compact action sets in which neither $(v_n)$ nor $(v_\lambda)$ converges. A natural question is whether the convergence of $(v_n)$ implies the convergence of $(v_\lambda)$, and conversely. When $(v_\lambda)$ is absolutely continuous with respect to $\lambda$, Neyman [24, Appendix C, p. 177] proved that $(v_n)$ converges to the limit of $(v_\lambda)$. In the dynamic programming framework, Lehrer and Sorin [6] proved that $(v_n)$ converges uniformly (with respect to the initial state) if and only if $(v_\lambda)$ converges uniformly, and that when uniform convergence holds, the two limits coincide. This result does not hold when uniform convergence is replaced by pointwise convergence (see Sorin [24, Chapter 1, p. 9-10]). In the two-player case, Li and Venel [7] proved that for recursive games (which are stochastic games where the payoff is 0 in nonabsorbing states), $(v_n)$ converges uniformly if and only if $(v_\lambda)$ converges uniformly, and that when uniform convergence holds the two limits are equal. The generalization of this result to stochastic games was open.

Mertens, Sorin and Zamir [10, Chapter IV] introduced a general model of stochastic games with signals, in which players observe neither the state nor the actions of their opponent, but instead observe at every stage a signal correlated with the current state and the actions which have just been played (the state space, action sets and signal sets are assumed to be finite). Ziliotto [27] provided an example of a stochastic game with public signals on the state and perfect observation of actions such that $(v_n)$ and $(v_\lambda)$ fail to converge (for special classes of stochastic games with signals in which $(v_n)$ and $(v_\lambda)$ converge to the same limit, see [2, 11, 15, 17, 19, 20, 21, 25]). The question of the relation between the convergence of $(v_n)$ and that of $(v_\lambda)$ was also open. By Mertens, Sorin and Zamir [10, Chapter III], one can associate to any stochastic game with signals an auxiliary stochastic game with perfect observation of the state and actions, which has the same $n$-stage and $\lambda$-discounted values. The state space of this auxiliary game is infinite and compact metric: it is the set of infinite higher-order beliefs of the players about the state. That is why in this paper we first study stochastic games with perfect observation, and then apply our results to stochastic games with signals.

The contribution of this paper is twofold. First, it generalizes both the result of Lehrer and Sorin [6] and that of Li and Venel [7] to stochastic games. We consider any stochastic game (with possibly infinite state space and action sets) in which, for all $n \in \mathbb{N}^*$ and $\lambda \in (0,1]$, the values $v_n$ and $v_\lambda$ exist and satisfy the Shapley equations, and prove the following Tauberian theorem: $(v_n)$ converges uniformly if and only if $(v_\lambda)$ converges uniformly, and when uniform convergence holds the two limits are equal. This theorem applies to many standard models in the literature: dynamic programming, stochastic games with finite state space and compact action sets, stochastic games with signals, hidden stochastic games, and Markov chain games. The proof of our result relies on the operator approach, introduced by Rosenberg and Sorin [22]. This approach relies on the fact that the values of the $n$-stage game and the $\lambda$-discounted game satisfy a functional equation, called the Shapley equation (see Shapley [23]). The properties of the associated nonexpansive operator can be exploited to infer convergence properties of $(v_n)$ and $(v_\lambda)$ (see Rosenberg and Sorin [22]). Thus, we start by proving a Tauberian theorem for nonexpansive operators, and then apply it to stochastic games.

Second, this paper provides the first example of a stochastic game with public signals on the state and perfect observation of the actions (hidden stochastic game), with finite state space, signal sets and action sets, in which, for some initial state $k_1$ known by both players, $(v_\lambda(k_1))$ and $(v_n(k_1))$ converge to distinct limits (note that in the example in Sorin [24, Chapter 1, p. 9-10], the state space is infinite and not compact). An example of a stochastic game with finite state space, compact action sets, perfect observation of the state and actions, and the same property can be deduced from this example. Thus, our example shows that as soon as the state is imperfectly observed, or the state space is not finite, or the action sets are not finite, there is no link between the convergence of $(v_\lambda(k_1))$ and that of $(v_n(k_1))$, where $k_1$ is some initial state.

The paper is organized as follows. In the first section, a Tauberian theorem for nonexpansive 
operators is stated and proved. In the second section, a Tauberian theorem for stochastic games 
is deduced from the first section. In the third section, particular cases of stochastic games are 
considered. The fourth section presents the aforementioned example. 

1 Nonexpansive operators 

Let $(X, \|\cdot\|)$ be a Banach space, and $\Psi : X \to X$ be a nonexpansive mapping, that is:

$$\forall (f,g) \in X^2, \quad \|\Psi(f) - \Psi(g)\| \leq \|f - g\|.$$

By a standard fixed point argument (see Sorin [24, Appendix C]), there exists a bounded family $(v_\lambda)_{\lambda \in (0,1]}$ such that for all $\lambda \in (0,1]$,

$$v_\lambda = \lambda \Psi\big((1-\lambda)\lambda^{-1} v_\lambda\big). \tag{1.1}$$
For $n \in \mathbb{N}^*$, define

$$v_n := n^{-1} \Psi^n(0), \tag{1.2}$$

where $\Psi^n$ is the $n$-th iterate of $\Psi$. Because $\Psi$ is nonexpansive, $\|\Psi^n(0)\| \leq n\|\Psi(0)\|$ for all $n$, so $(v_n)_{n \geq 1}$ is bounded.
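Both families can be computed for any concrete nonexpansive operator on a finite-dimensional space with the uniform norm. The sketch below is our own illustration: `discounted_value` finds $v_\lambda$ by Picard iteration of the $(1-\lambda)$-contraction $f \mapsto \lambda\Psi((1-\lambda)\lambda^{-1}f)$ from (1.1), and `stage_value` computes $v_n$ from (1.2). The toy operator $\Psi(f) = g + Pf$ (a deterministic 2-cycle with payoffs 1 and 0) and all names are hypothetical.

```python
from typing import Callable, List

Vector = List[float]
Operator = Callable[[Vector], Vector]


def sup_dist(u: Vector, w: Vector) -> float:
    """Uniform-norm distance between two functions on a finite set."""
    return max(abs(a - b) for a, b in zip(u, w))


def discounted_value(psi: Operator, lam: float, dim: int,
                     tol: float = 1e-13, max_iter: int = 10**7) -> Vector:
    """v_lambda of (1.1): fixed point of f -> lam * psi((1 - lam) / lam * f).

    For nonexpansive psi this map is a (1 - lam)-contraction, so Picard
    iteration from 0 converges to its unique fixed point."""
    v = [0.0] * dim
    for _ in range(max_iter):
        new = [lam * x for x in psi([(1.0 - lam) / lam * y for y in v])]
        if sup_dist(new, v) <= tol:
            return new
        v = new
    raise RuntimeError("fixed-point iteration did not converge")


def stage_value(psi: Operator, n: int, dim: int) -> Vector:
    """v_n of (1.2): n^{-1} psi^n(0)."""
    f = [0.0] * dim
    for _ in range(n):
        f = psi(f)
    return [x / n for x in f]


# Hypothetical toy operator psi(f) = g + P f on two states, where P moves
# deterministically 0 -> 1 -> 0 and the stage payoffs are g = (1, 0).
G = [1.0, 0.0]


def toy_psi(f: Vector) -> Vector:
    return [G[0] + f[1], G[1] + f[0]]


if __name__ == "__main__":
    print(stage_value(toy_psi, 4, 2))         # [0.5, 0.5]
    print(discounted_value(toy_psi, 0.5, 2))  # close to [2/3, 1/3]
```

On this toy operator, $v_n = (1/2, 1/2)$ for even $n$ and $v_\lambda = (1/(2-\lambda), (1-\lambda)/(2-\lambda))$, so both families tend to $(1/2, 1/2)$.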

Kohlberg and Neyman [5] provided conditions under which $\lim_{n \to +\infty} v_n$ and $\lim_{\lambda \to 0} v_\lambda$ exist. In this section, we investigate the link between the existence of $\lim_{n \to +\infty} v_n$ and that of $\lim_{\lambda \to 0} v_\lambda$. We make the following assumption:

Assumption 1. There exists $C > 0$ such that for all $\lambda, \lambda' \in (0,1]$ and $f \in X$,

$$\|\lambda \Psi(\lambda^{-1} f) - \lambda' \Psi(\lambda'^{-1} f)\| \leq C |\lambda - \lambda'|.$$

Remark 1.1. An important class of operators which satisfy Assumption 1 is the following. Let $K$ be any set, and $X$ be the set of bounded real-valued functions defined on $K$, equipped with the uniform norm. Consider two sets $S$ and $T$, and a family of linear forms $(P_{k,s,t})_{(k,s,t) \in K \times S \times T}$ on $X$ such that, for all $(k,s,t)$, $P_{k,s,t}$ has norm at most 1. Let $g : K \times S \times T \to \mathbb{R}$ be a bounded function. Define $\Psi : X \to X$ by $\Psi(f)(k) := \sup_{s \in S} \inf_{t \in T} \{g(k,s,t) + P_{k,s,t}(f)\}$, for all $f \in X$ and $k \in K$. By linearity of $P_{k,s,t}$, $\lambda \Psi(\lambda^{-1} f)(k) = \sup_{s \in S} \inf_{t \in T} \{\lambda g(k,s,t) + P_{k,s,t}(f)\}$, so Assumption 1 holds with any $C > 0$ such that $C \geq \sup |g|$. This class includes Shapley operators (see Neyman [12, p. 397-415]): this corresponds to the case where $K$ is the state space of some zero-sum stochastic game, $S$ (resp. $T$) is the set of mixed actions of Player 1 (resp. 2), $k$ is the current state, and $P_{k,s,t}(f)$ is the expectation of $f(k')$ under the mixed actions $s$ and $t$, where $k'$ is the state at the next stage. Under suitable assumptions, for all $n \in \mathbb{N}^*$ and $\lambda \in (0,1]$, $v_n$ and $v_\lambda$ are respectively the value of the $n$-stage game and the value of the $\lambda$-discounted game. This point will be useful in Sections 2 and 3.
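The claim that operators of this class satisfy Assumption 1 with $C = \sup|g|$ can be sanity-checked numerically. The sketch below is our own illustration under simplifying assumptions: $K$, $S$ and $T$ are finite (so the sup and inf become a max and a min over pure actions), each $P_{k,s,t}$ is given by a probability vector, and the data are random. All names (`random_instance`, `make_psi`, `assumption1_ratio`) are hypothetical.

```python
import random
from typing import List

Vector = List[float]


def random_instance(n_states: int, n_s: int, n_t: int, seed: int = 0):
    """Hypothetical data: payoffs g[k][s][t] in [-1, 1] and, for each (k, s, t),
    a probability vector P[k][s][t] over states (a linear form of norm 1)."""
    rng = random.Random(seed)
    g = [[[rng.uniform(-1.0, 1.0) for _ in range(n_t)] for _ in range(n_s)]
         for _ in range(n_states)]
    P = []
    for _ in range(n_states):
        block = []
        for _ in range(n_s):
            row = []
            for _ in range(n_t):
                w = [rng.random() + 1e-3 for _ in range(n_states)]
                total = sum(w)
                row.append([x / total for x in w])
            block.append(row)
        P.append(block)
    return g, P


def make_psi(g, P):
    """Psi(f)(k) = max_s min_t {g(k, s, t) + P_{k,s,t}(f)} over finite S and T."""
    def psi(f: Vector) -> Vector:
        return [
            max(
                min(g[k][s][t] + sum(p * x for p, x in zip(P[k][s][t], f))
                    for t in range(len(g[k][s])))
                for s in range(len(g[k]))
            )
            for k in range(len(g))
        ]
    return psi


def assumption1_ratio(psi, lam1: float, lam2: float, f: Vector) -> float:
    """||lam1 psi(f / lam1) - lam2 psi(f / lam2)|| / |lam1 - lam2| (uniform norm)."""
    a = [lam1 * x for x in psi([y / lam1 for y in f])]
    b = [lam2 * x for x in psi([y / lam2 for y in f])]
    return max(abs(u - v) for u, v in zip(a, b)) / abs(lam1 - lam2)


if __name__ == "__main__":
    g, P = random_instance(3, 2, 2, seed=1)
    psi = make_psi(g, P)
    C = max(abs(x) for gk in g for gks in gk for x in gks)
    rng = random.Random(2)
    worst = max(
        assumption1_ratio(psi, rng.uniform(0.01, 0.5), rng.uniform(0.51, 1.0),
                          [rng.uniform(-5.0, 5.0) for _ in range(3)])
        for _ in range(200)
    )
    print(worst <= C + 1e-9)  # True
```

The small tolerance only absorbs floating-point rounding; mathematically the ratio never exceeds $\sup|g|$.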

We now state a Tauberian theorem for nonexpansive operators satisfying Assumption 1. 

Theorem 1.2. Under Assumption 1, the two following statements are equivalent:

(a) The sequence $(v_n)_{n \geq 1}$ converges when $n$ goes to infinity.

(b) The mapping $\lambda \mapsto v_\lambda$ has a limit when $\lambda$ goes to 0.

Moreover, when these statements hold, we have $\lim_{n \to +\infty} v_n = \lim_{\lambda \to 0} v_\lambda$.

The remainder of this section is dedicated to the proof of the theorem. 

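Definition 1.3. Let $\lambda \in (0,1]$ and $n \in \mathbb{N}^*$. The operator $\Psi_\lambda^n : X \to X$ is defined recursively by $\Psi_\lambda^0(f) := f$ for all $f \in X$, and for $n \geq 1$:

$$\forall f \in X, \quad \Psi_\lambda^n(f) := \lambda \Psi\big((1-\lambda)\lambda^{-1} \Psi_\lambda^{n-1}(f)\big).$$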


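Note that equation (1.1) can be written

$$v_\lambda = \Psi_\lambda^1(v_\lambda),$$

and thus, by induction, $v_\lambda = \Psi_\lambda^t(v_\lambda)$ for all $t \in \mathbb{N}^*$.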


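Lemma 1.4. Let $f, g \in X$, $\lambda \in (0,1]$, $n \in \mathbb{N}^*$ and $t \in \{1, 2, \dots, n\}$. Then

(i)

$$\|\Psi_\lambda^t(f) - \Psi_\lambda^t(g)\| \leq (1-\lambda)^t \|f - g\|,$$

(ii)

$$\|\Psi_{1/n}^t(f) - n^{-1}\Psi^t((n-t)f)\| \leq (C + \|f\|)\left[t n^{-1} - 1 + (1 - n^{-1})^t\right].$$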


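Proof.

(i) Since $\Psi$ is nonexpansive, $\Psi_\lambda^1$ is $(1-\lambda)$-Lipschitz; the result follows by induction on $t$.

(ii) We have

$$\begin{aligned}
\|\Psi_{1/n}^t(f) - n^{-1}\Psi^t((n-t)f)\| &\leq \|(1-n^{-1})\Psi_{1/n}^{t-1}(f) - n^{-1}\Psi^{t-1}((n-t)f)\| \\
&\leq C\left[n^{-1} - (1-n^{-1})n^{-1}\right] + \|(1-n^{-1})^2\Psi_{1/n}^{t-2}(f) - n^{-1}\Psi^{t-2}((n-t)f)\| \\
&\leq C\sum_{m=1}^{t}\left(n^{-1} - (1-n^{-1})^{m-1}n^{-1}\right) + \|(1-n^{-1})^t f - n^{-1}(n-t)f\| \\
&= C\left[tn^{-1} - 1 + (1-n^{-1})^t\right] + \left[(1-n^{-1})^t - n^{-1}(n-t)\right]\|f\| \\
&= (C + \|f\|)\left[tn^{-1} - 1 + (1-n^{-1})^t\right].
\end{aligned}$$

The first inequality stems from the nonexpansiveness of $\Psi$. In the second inequality, we applied Assumption 1 with $\lambda = (1-n^{-1})n^{-1}$ and $\lambda' = n^{-1}$ to the function $(1-n^{-1})^2\Psi_{1/n}^{t-2}(f)$, and used the nonexpansiveness of $\Psi$. Applying successively Assumption 1 with $\lambda = (1-n^{-1})^{m-1}n^{-1}$ and $\lambda' = n^{-1}$ to the function $(1-n^{-1})^m\Psi_{1/n}^{t-m}(f)$ ($m \in \{1, \dots, t\}$), together with the nonexpansiveness of $\Psi$, yields the third inequality. The fourth line uses $\sum_{m=1}^{t}(1-n^{-1})^{m-1}n^{-1} = 1 - (1-n^{-1})^t$, and the last equality uses $(1-n^{-1})^t \geq 1 - tn^{-1}$ (Bernoulli's inequality).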


□ 
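Lemma 1.4 (ii) can be tested on a concrete operator. The sketch below, an illustration under our own assumptions, uses a one-player operator $\Psi(f)(k) = \max_s\{g(k,s) + P_{k,s}(f)\}$ on four states with random data (a special case of Remark 1.1, so Assumption 1 holds with $C = \max|g|$), and computes both sides of (ii) for $\lambda = 1/n$. The names and data are hypothetical.

```python
import random
from typing import List

Vector = List[float]


def make_psi(g, P):
    """Hypothetical one-player operator on finite states:
    Psi(f)(k) = max_s {g[k][s] + sum_k' P[k][s][k'] f(k')}.
    It is nonexpansive and satisfies Assumption 1 with C = max |g|."""
    def psi(f: Vector) -> Vector:
        return [max(g[k][s] + sum(p * x for p, x in zip(P[k][s], f))
                    for s in range(len(g[k])))
                for k in range(len(g))]
    return psi


def psi_lambda_power(psi, lam: float, t: int, f: Vector) -> Vector:
    """Psi_lambda^t(f) from Definition 1.3."""
    for _ in range(t):
        f = [lam * x for x in psi([(1.0 - lam) / lam * y for y in f])]
    return f


def psi_power(psi, t: int, f: Vector) -> Vector:
    """Psi^t(f), the t-th iterate of Psi."""
    for _ in range(t):
        f = psi(f)
    return f


def lemma_14_gap_and_bound(psi, C: float, n: int, t: int, f: Vector):
    """Left- and right-hand sides of Lemma 1.4 (ii) for lambda = 1/n."""
    lhs = psi_lambda_power(psi, 1.0 / n, t, f)
    rhs = [x / n for x in psi_power(psi, t, [(n - t) * y for y in f])]
    gap = max(abs(a - b) for a, b in zip(lhs, rhs))
    bound = (C + max(abs(x) for x in f)) * (t / n - 1.0 + (1.0 - 1.0 / n) ** t)
    return gap, bound


# Random hypothetical instance with 4 states and 3 actions per state.
rng = random.Random(0)
N_STATES, N_ACTIONS = 4, 3
g = [[rng.uniform(-1.0, 1.0) for _ in range(N_ACTIONS)] for _ in range(N_STATES)]
P = []
for _ in range(N_STATES):
    rows = []
    for _ in range(N_ACTIONS):
        w = [rng.random() + 1e-3 for _ in range(N_STATES)]
        total = sum(w)
        rows.append([x / total for x in w])
    P.append(rows)
psi = make_psi(g, P)
C = max(abs(x) for row in g for x in row)
```

For every $n$ and $t \le n$, the computed gap should stay below the bound; for $t = 1$ both sides vanish up to rounding.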


We now prove that (a) implies (b).

(a) ⇒ (b)


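Assume (a). Let $(\lambda, \lambda') \in (0,1]^2$. We have

$$\begin{aligned}
\|v_\lambda - v_{\lambda'}\| &= \|\lambda\Psi((1-\lambda)\lambda^{-1}v_\lambda) - \lambda'\Psi((1-\lambda')\lambda'^{-1}v_{\lambda'})\| \\
&\leq C|\lambda - \lambda'| + \|\lambda'\Psi((1-\lambda)\lambda'^{-1}v_\lambda) - \lambda'\Psi((1-\lambda')\lambda'^{-1}v_{\lambda'})\| \\
&\leq (C + \|v_\lambda\|)|\lambda - \lambda'| + (1-\lambda')\|v_\lambda - v_{\lambda'}\|.
\end{aligned}$$

In the first inequality, we applied Assumption 1 to $f = (1-\lambda)v_\lambda$, and in the second inequality, we used the nonexpansiveness of $\Psi$ and the triangle inequality. Since $(v_\lambda)$ is bounded, we deduce the existence of $A > 0$ such that for all $(\lambda, \lambda') \in (0,1]^2$, $\|v_\lambda - v_{\lambda'}\| \leq A|\lambda - \lambda'|\lambda'^{-1}$. In particular, for $\lambda \in [(n+1)^{-1}, n^{-1}]$, we have $\|v_\lambda - v_{n^{-1}}\| \leq A(n+1)^{-1}$. Consequently, in order to prove (b), it is sufficient to prove that $(v_{n^{-1}})_{n \geq 1}$ converges when $n$ goes to infinity.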

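By (a), there exists $v^* \in X$ such that $(v_n)_{n \geq 1}$ converges to $v^*$. Let $\epsilon \in (0, 1/4)$. Let $N_0 \in \mathbb{N}^*$ be such that for all $n \geq N_0$,

$$\|v_n - v^*\| \leq \epsilon^2/2. \tag{1.3}$$

Let $n \geq \epsilon^{-2}N_0$, $\lambda := n^{-1}$, and $t := \lfloor \epsilon n \rfloor$, where $\lfloor x \rfloor$ denotes the integer part of $x$. Equations (1.1) and (1.2) yield

$$v_\lambda = \Psi_\lambda^t(v_\lambda), \tag{1.4}$$

and

$$v_n = n^{-1}\Psi^t((n-t)v_{n-t}). \tag{1.5}$$

We have

$$\|v_\lambda - v_n\| \leq \|v_\lambda - \Psi_\lambda^t(v_{n-t})\| + \|\Psi_\lambda^t(v_{n-t}) - v_n\|.$$

Applying first (1.4) and Lemma 1.4 (i), then (1.3) (note that $n - t \geq (1-\epsilon)n \geq N_0$), we obtain

$$\begin{aligned}
\|v_\lambda - \Psi_\lambda^t(v_{n-t})\| &\leq (1-\lambda)^t\|v_\lambda - v_{n-t}\| \\
&\leq (1-\lambda)^t\|v_\lambda - v_n\| + (1-\lambda)^t\|v_n - v_{n-t}\| \\
&\leq (1-\lambda)^t\|v_\lambda - v_n\| + \epsilon^2.
\end{aligned}$$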
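Let $M := C + \sup_{n \in \mathbb{N}^*}\|v_n\|$. Equality (1.5) and Lemma 1.4 (ii) yield

$$\begin{aligned}
\|\Psi_\lambda^t(v_{n-t}) - v_n\| &\leq (C + \|v_{n-t}\|)\left[tn^{-1} - 1 + (1-n^{-1})^t\right] \\
&\leq M\left(\epsilon - 1 + e^{-\epsilon + \epsilon^2}\right),
\end{aligned}$$

where we used $tn^{-1} \leq \epsilon$ and $(1-n^{-1})^t \leq e^{-t/n} \leq e^{-\epsilon + n^{-1}} \leq e^{-\epsilon + \epsilon^2}$ (recall that $n \geq \epsilon^{-2}N_0 \geq \epsilon^{-2}$).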


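The last two inequalities yield

$$\begin{aligned}
\|v_\lambda - v_n\| &\leq (1-\lambda)^{\lfloor \epsilon n \rfloor}\|v_\lambda - v_n\| + \epsilon^2 + M\left(\epsilon - 1 + e^{-\epsilon+\epsilon^2}\right) \\
&\leq e^{-\epsilon+\epsilon^2}\|v_\lambda - v_n\| + \epsilon^2 + M\left(\epsilon - 1 + e^{-\epsilon+\epsilon^2}\right).
\end{aligned}$$

We deduce that

$$\|v_\lambda - v_n\| \leq \frac{\epsilon^2 + M\left(\epsilon - 1 + e^{-\epsilon+\epsilon^2}\right)}{1 - e^{-\epsilon+\epsilon^2}}.$$

The right-hand side goes to 0 when $\epsilon$ goes to 0. Hence $\|v_{n^{-1}} - v_n\|$ goes to 0 as $n$ goes to infinity, so $(v_{n^{-1}})_{n \geq 1}$ converges to $v^*$, and (b) holds.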
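The final estimate of this implication, and the elementary inequality $(1-n^{-1})^{\lfloor \epsilon n \rfloor} \leq e^{-\epsilon+\epsilon^2}$ for $n \geq \epsilon^{-2}$ used along the way, can be checked numerically. This is a minimal sketch; the function names are ours.

```python
import math


def decay_factor(eps: float) -> float:
    """e^{-eps + eps^2}."""
    return math.exp(-eps + eps * eps)


def a_implies_b_bound(eps: float, M: float) -> float:
    """(eps^2 + M (eps - 1 + e^{-eps+eps^2})) / (1 - e^{-eps+eps^2}).
    math.expm1 avoids cancellation when eps is small."""
    em1 = math.expm1(-eps + eps * eps)  # equals e^{-eps+eps^2} - 1, which is negative
    return (eps * eps + M * (eps + em1)) / (-em1)


def power_inequality_holds(eps: float, n: int) -> bool:
    """(1 - 1/n)^floor(eps n) <= e^{-eps+eps^2}, the step used for n >= eps^{-2}."""
    return (1.0 - 1.0 / n) ** math.floor(eps * n) <= decay_factor(eps)


if __name__ == "__main__":
    for eps in (0.2, 0.1, 0.01, 0.001):
        print(eps, a_implies_b_bound(eps, M=1.0))
```

For small $\epsilon$ the bound behaves like $(1 + 3M/2)\epsilon$, which is why it vanishes as $\epsilon \to 0$.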


(b) ⇒ (a)

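Assume (b), and let $v^* := \lim_{\lambda \to 0} v_\lambda$. Let $\epsilon_0 \in (0,1)$ be such that for all $\epsilon < \epsilon_0$, $e^{-\epsilon} < 1 - \epsilon + \epsilon^2$. Fix $\epsilon \in (0, \epsilon_0/2]$, and define $r_0 := \lfloor \epsilon^{-3/2} \rfloor$. There exists $\beta \in (0,1)$ such that for all $\lambda \in (0, \beta]$, we have

$$\|v_\lambda - v^*\| \leq \epsilon^2/2. \tag{1.6}$$

Let $N \geq 1$ be such that $\lfloor (1-\epsilon)^{r_0-1}N \rfloor \geq (\beta\epsilon)^{-1}$. Let $n \geq N$. For $r \in \mathbb{N}^*$, define $n_r := \lfloor (1-\epsilon)^{r-1}n \rfloor$ and $\lambda_r := 1/n_r$. The following assertions hold: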

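Lemma 1.5.

(i) $\forall r \in \{1, \dots, r_0\}$, $\lambda_r \leq \beta\epsilon$.

(ii) $\forall r \in \{1, \dots, r_0 - 1\}$, $(1 - 1/n_r)^{n_r - n_{r+1}} - n_{r+1}/n_r \leq 4\epsilon^2$.

(iii) $\forall r \in \{1, \dots, r_0 - 1\}$, $\|v_{\lambda_r} - v_{\lambda_{r+1}}\| \leq \epsilon^2$.

Proof.

(i) Let $r \in \{1, \dots, r_0\}$. We have $\lfloor (1-\epsilon)^{r-1}n \rfloor \geq \lfloor (1-\epsilon)^{r_0-1}N \rfloor \geq (\beta\epsilon)^{-1}$, thus $\lambda_r \leq \beta\epsilon$.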


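(ii) Let $r \in \{1, \dots, r_0 - 1\}$. Since $n_{r+1} \geq (1-\epsilon)n_r - 1$, we have $(n_r - n_{r+1})/n_r \leq \epsilon + 1/n_r \leq 2\epsilon$ by (i). Together with $1 - y \leq e^{-y}$ and the choice of $\epsilon_0$, this gives

$$\begin{aligned}
(1 - 1/n_r)^{n_r - n_{r+1}} &\leq e^{-(n_r - n_{r+1})/n_r} \\
&< 1 - (n_r - n_{r+1})/n_r + \left[(n_r - n_{r+1})/n_r\right]^2 \\
&\leq n_{r+1}/n_r + (1 - n_{r+1}/n_r)^2 \\
&\leq n_{r+1}/n_r + (\epsilon + 1/n_r)^2 \\
&\leq n_{r+1}/n_r + 4\epsilon^2.
\end{aligned}$$

(iii) It is a direct consequence of (1.6) and (i).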


□ 
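The stage lengths $n_r$ and assertions (i) and (ii) of Lemma 1.5 are easy to check numerically. In the sketch below (our own illustration), `admissible_n` returns one admissible choice of $n$, with a small safety margin over the requirement on $N$ in the text; the value of $\beta$ is an arbitrary assumption, since in the proof it comes from (1.6).

```python
import math
from typing import List


def stage_lengths(eps: float, n: int) -> List[int]:
    """n_r = floor((1 - eps)^(r - 1) n) for r = 1, ..., r0, with r0 = floor(eps^{-3/2})."""
    r0 = math.floor(eps ** -1.5)
    return [math.floor((1.0 - eps) ** (r - 1) * n) for r in range(1, r0 + 1)]


def admissible_n(eps: float, beta: float) -> int:
    """Some n with floor((1 - eps)^(r0 - 1) n) >= 1 / (beta eps), plus a margin of 2."""
    r0 = math.floor(eps ** -1.5)
    return math.ceil((1.0 / (beta * eps) + 2.0) / (1.0 - eps) ** (r0 - 1))


def lemma_15_holds(eps: float, beta: float, n: int) -> bool:
    """Check assertions (i) and (ii) of Lemma 1.5 for the given eps, beta and n."""
    n_r = stage_lengths(eps, n)
    part_i = all(1.0 / m <= beta * eps for m in n_r)
    part_ii = all(
        (1.0 - 1.0 / n_r[r]) ** (n_r[r] - n_r[r + 1]) - n_r[r + 1] / n_r[r] <= 4.0 * eps * eps
        for r in range(len(n_r) - 1)
    )
    return part_i and part_ii


if __name__ == "__main__":
    n = admissible_n(0.05, beta=0.5)
    print(n, lemma_15_holds(0.05, 0.5, n))
```

In practice the left-hand side of (ii) is close to $\epsilon^2/2$, well below $4\epsilon^2$.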


Let $r \in \{1, \dots, r_0 - 1\}$. Equations (1.1) and (1.2) yield

$$v_{n_r} = (n_r)^{-1}\Psi^{n_r - n_{r+1}}(n_{r+1}v_{n_{r+1}}) \tag{1.7}$$

and

$$v_{\lambda_r} = \Psi_{\lambda_r}^{n_r - n_{r+1}}(v_{\lambda_r}). \tag{1.8}$$

We have

$$\|v_{n_r} - v_{\lambda_r}\| \leq \|v_{n_r} - (n_r)^{-1}\Psi^{n_r - n_{r+1}}(n_{r+1}v_{\lambda_r})\| + \|(n_r)^{-1}\Psi^{n_r - n_{r+1}}(n_{r+1}v_{\lambda_r}) - v_{\lambda_r}\|.$$
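Applying first (1.7) and the nonexpansiveness of $\Psi$, and then Lemma 1.5 (iii), yields

$$\begin{aligned}
\|v_{n_r} - (n_r)^{-1}\Psi^{n_r - n_{r+1}}(n_{r+1}v_{\lambda_r})\| &\leq (n_r)^{-1}n_{r+1}\|v_{n_{r+1}} - v_{\lambda_r}\| \\
&\leq (n_r)^{-1}n_{r+1}\left(\|v_{n_{r+1}} - v_{\lambda_{r+1}}\| + \epsilon^2\right).
\end{aligned}$$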

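Let $M := C + \sup_{n \in \mathbb{N}^*}\|v_n\| + \sup_{\lambda \in (0,1]}\|v_\lambda\| + 1$. Equality (1.8) and Lemma 1.4 (ii) yield

$$\begin{aligned}
\|(n_r)^{-1}\Psi^{n_r - n_{r+1}}(n_{r+1}v_{\lambda_r}) - v_{\lambda_r}\| &\leq (C + \|v_{\lambda_r}\|)\left[(n_r - n_{r+1})/n_r - 1 + (1 - 1/n_r)^{n_r - n_{r+1}}\right] \\
&\leq M\left[(1 - 1/n_r)^{n_r - n_{r+1}} - n_{r+1}/n_r\right] \\
&\leq 4\epsilon^2 M.
\end{aligned}$$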


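We deduce that

$$\begin{aligned}
n_r\|v_{n_r} - v_{\lambda_r}\| &\leq n_{r+1}\left(\|v_{n_{r+1}} - v_{\lambda_{r+1}}\| + \epsilon^2\right) + 4\epsilon^2 M n_r \\
&\leq n_{r+1}\|v_{n_{r+1}} - v_{\lambda_{r+1}}\| + 5\epsilon^2 M n_r.
\end{aligned}$$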


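Summing from $r = 1$ to $r = r_0 - 1$, and using $n_1 = n$ and $n_r \leq n$, yields

$$\begin{aligned}
\|v_n - v_{n^{-1}}\| &\leq n_{r_0}n^{-1}\|v_{n_{r_0}} - v_{\lambda_{r_0}}\| + 5\epsilon^2 M r_0 \\
&\leq 2(1-\epsilon)^{\lfloor \epsilon^{-3/2} \rfloor - 1}M + 5\epsilon^{1/2}M.
\end{aligned}$$

The right-hand side goes to 0 as $\epsilon$ goes to 0. Since $(v_{n^{-1}})_{n \geq 1}$ converges to $v^*$ by (b), so does $(v_n)_{n \geq 1}$, thus (a) holds.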
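To see how slowly the final bound of this implication decays, the sketch below evaluates $2(1-\epsilon)^{\lfloor\epsilon^{-3/2}\rfloor - 1}M + 5\epsilon^{1/2}M$. The first term vanishes very fast, so the bound is essentially $5\epsilon^{1/2}M$. This is a minimal sketch; the function name is ours.

```python
import math


def b_implies_a_bound(eps: float, M: float) -> float:
    """2 (1 - eps)^(floor(eps^{-3/2}) - 1) M + 5 eps^{1/2} M."""
    r0 = math.floor(eps ** -1.5)
    # (1 - eps)^(r0 - 1) underflows to 0.0 for tiny eps, which is harmless here
    return 2.0 * (1.0 - eps) ** (r0 - 1) * M + 5.0 * math.sqrt(eps) * M


if __name__ == "__main__":
    for eps in (1e-2, 1e-4, 1e-6):
        print(eps, b_implies_a_bound(eps, M=1.0))
```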

Note that the proofs of the two implications show that when (a) and (b) hold, we have $\lim_{n \to +\infty} v_n = \lim_{\lambda \to 0} v_\lambda$.


2 Applications to zero-sum stochastic games 

In this section, we apply Theorem 1.2 to zero-sum stochastic games. 

2.1 Dynamic programming 

A dynamic programming problem is described by a state space $K$, a correspondence $F : K \rightrightarrows K$ with nonempty values, and a bounded payoff function $g : K \to \mathbb{R}$.

The problem proceeds as follows. Given an initial state $k_1 \in K$, at each stage $m \geq 1$, the decision-maker chooses $k_{m+1} \in F(k_m)$, and gets the stage payoff $g(k_m)$. For $\lambda \in (0,1]$ (resp. $n \in \mathbb{N}^*$), in the $\lambda$-discounted problem (resp. $n$-stage problem), the decision-maker maximizes the total payoff

$$\sum_{m \geq 1}\lambda(1-\lambda)^{m-1}g(k_m) \quad \left(\text{resp. } n^{-1}\sum_{m=1}^{n}g(k_m)\right).$$

A strategy for the decision-maker assigns a decision $k_{m+1} \in F(k_m)$ to each finite history $(k_1, k_2, \dots, k_m)$. We denote respectively by $v_\lambda(k_1)$ and $v_n(k_1)$ the value of the $\lambda$-discounted problem and of the $n$-stage problem: $v_\lambda(k_1) := \sup_{s \in S}\sum_{m \geq 1}\lambda(1-\lambda)^{m-1}g(k_m)$ and $v_n(k_1) := \sup_{s \in S} n^{-1}\sum_{m=1}^{n}g(k_m)$, where $S$ is the set of strategies.

Let $X$ be the set of bounded real-valued functions defined on $K$, equipped with the uniform norm. For $(f,k) \in X \times K$, let

$$\Psi(f)(k) := g(k) + \sup_{k' \in F(k)} f(k').$$

$X$ is a Banach space and $\Psi$ is a nonexpansive operator which satisfies Assumption 1 (indeed, $\lambda\Psi(\lambda^{-1}f)(k) = \lambda g(k) + \sup_{k' \in F(k)} f(k')$). Standard dynamic programming gives (see Lehrer and Sorin [6]):

$$v_\lambda(k) = \lambda g(k) + (1-\lambda)\sup_{k' \in F(k)} v_\lambda(k') = \left[\lambda\Psi((1-\lambda)\lambda^{-1}v_\lambda)\right](k)$$

and

$$v_n(k) = n^{-1}g(k) + (1-n^{-1})\sup_{k' \in F(k)} v_{n-1}(k') = \left[n^{-1}\Psi((n-1)v_{n-1})\right](k).$$

By induction, the second equation gives $v_n = n^{-1}\Psi^n(0)$, which is (1.2). Applying Theorem 1.2, we recover the Tauberian theorem proved in Lehrer and Sorin [6]: $(v_n)$ converges uniformly on $K$ if and only if $(v_\lambda)$ converges uniformly on $K$, and when uniform convergence holds, the two limits coincide.
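On a finite state space these two recursions can be run directly. The sketch below uses a hypothetical three-state problem (the correspondence `F`, the payoffs `g` and the function names are ours): from states 0 and 1 the best long-run behaviour is to alternate between 0 and 1, with average payoff 1/2, while state 2 is a trap paying 0.4, so both $(v_n)$ and $(v_\lambda)$ should approach $(1/2, 1/2, 0.4)$.

```python
from typing import Dict, List

# Hypothetical finite problem: F[k] lists the states reachable from k and g[k]
# is the stage payoff. The cycle 0 <-> 1 averages 1/2; state 2 is a trap paying 0.4.
F: Dict[int, List[int]] = {0: [1], 1: [0, 2], 2: [2]}
g: List[float] = [1.0, 0.0, 0.4]


def psi(f: List[float]) -> List[float]:
    """Psi(f)(k) = g(k) + max_{k' in F(k)} f(k')."""
    return [g[k] + max(f[kp] for kp in F[k]) for k in range(len(g))]


def v_n(n: int) -> List[float]:
    """n-stage values via v_m = m^{-1} Psi((m - 1) v_{m-1}), starting from v_0 = 0."""
    v = [0.0] * len(g)
    for m in range(1, n + 1):
        v = [x / m for x in psi([(m - 1) * y for y in v])]
    return v


def v_lambda(lam: float, tol: float = 1e-12, max_iter: int = 10**6) -> List[float]:
    """Discounted values: fixed point of v = lam * Psi((1 - lam) / lam * v)."""
    v = [0.0] * len(g)
    for _ in range(max_iter):
        new = [lam * x for x in psi([(1.0 - lam) / lam * y for y in v])]
        if max(abs(a - b) for a, b in zip(new, v)) <= tol:
            return new
        v = new
    raise RuntimeError("value iteration did not converge")


if __name__ == "__main__":
    print(v_n(2000))
    print(v_lambda(0.001))
```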


2.2 Zero-sum stochastic games 

If $C$ is a Borel subset of a Polish space, we denote by $\Delta(C)$ the set of probability measures on $C$, equipped with the weak* topology.

We use the same framework as in Maitra and Parthasarathy [8]. We consider a general model of zero-sum stochastic game, described by a state space $K$ which is a Borel subset of a Polish space, two action sets $I$ and $J$, which are Borel subsets of a Polish space, a Borel measurable transition function $q : K \times I \times J \to \Delta(K)$, and a bounded Borel measurable payoff function $g : K \times I \times J \to \mathbb{R}$.

The initial state is $k_1 \in K$, and the stochastic game $\Gamma(k_1)$ which starts in $k_1$ proceeds as follows. At each stage $m \geq 1$, both players choose simultaneously and independently an action, $i_m \in I$ (resp. $j_m \in J$) for Player 1 (resp. 2). The payoff at stage $m$ is $g_m := g(k_m, i_m, j_m)$. The state $k_{m+1}$ of stage $m+1$ is drawn from the probability distribution $q(k_m, i_m, j_m)$. Then $(k_{m+1}, i_m, j_m)$ is publicly announced to both players.

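The set of all possible histories before stage $m$ is $H_m := (K \times I \times J)^{m-1} \times K$. A behavioral strategy for Player 1 (resp. 2) is a Borel measurable mapping $\sigma : \bigcup_{m \geq 1} H_m \to \Delta(I)$ (resp. $\tau : \bigcup_{m \geq 1} H_m \to \Delta(J)$). Denote by $\Sigma$ (resp. $\mathcal{T}$) the set of behavioral strategies of Player 1 (resp. 2). A triple $(k_1, \sigma, \tau) \in K \times \Sigma \times \mathcal{T}$ induces a probability measure on $H_\infty := (K \times I \times J)^{\mathbb{N}^*}$, denoted by $\mathbb{P}^{k_1}_{\sigma,\tau}$. Let $\lambda \in (0,1]$. The $\lambda$-discounted game $\Gamma_\lambda(k_1)$ is the game defined by its normal form $(\Sigma, \mathcal{T}, \gamma^{k_1}_\lambda)$, where

$$\gamma^{k_1}_\lambda(\sigma,\tau) := \mathbb{E}^{k_1}_{\sigma,\tau}\left(\sum_{m \geq 1}\lambda(1-\lambda)^{m-1}g_m\right).$$

Let $n \in \mathbb{N}^*$. The $n$-stage game $\Gamma_n(k_1)$ is the game defined by its normal form $(\Sigma, \mathcal{T}, \gamma^{k_1}_n)$, where

$$\gamma^{k_1}_n(\sigma,\tau) := \mathbb{E}^{k_1}_{\sigma,\tau}\left(\frac{1}{n}\sum_{m=1}^{n}g_m\right).$$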




Let $f : K \to \mathbb{R}$ be a bounded Borel measurable function, and $(k, x, y) \in K \times \Delta(I) \times \Delta(J)$. Define

$$E^k_{x,y}(f) := \int_{(k',i,j) \in K \times I \times J} f(k')\, dq(k,i,j)(k')\, dx(i)\, dy(j)$$

and

$$g(k,x,y) := \int_{I \times J} g(k,i,j)\, dx(i)\, dy(j).$$

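We make the following assumption:

Assumption 2. For all $k_1 \in K$, $\lambda \in (0,1]$ and $n \in \mathbb{N}^*$, the games $\Gamma_\lambda(k_1)$ and $\Gamma_n(k_1)$ have a value, that is, there exist real numbers $v_\lambda(k_1)$ and $v_n(k_1)$ such that

$$v_\lambda(k_1) = \sup_{\sigma \in \Sigma}\inf_{\tau \in \mathcal{T}}\gamma^{k_1}_\lambda(\sigma,\tau) = \inf_{\tau \in \mathcal{T}}\sup_{\sigma \in \Sigma}\gamma^{k_1}_\lambda(\sigma,\tau)$$

and

$$v_n(k_1) = \sup_{\sigma \in \Sigma}\inf_{\tau \in \mathcal{T}}\gamma^{k_1}_n(\sigma,\tau) = \inf_{\tau \in \mathcal{T}}\sup_{\sigma \in \Sigma}\gamma^{k_1}_n(\sigma,\tau).$$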

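Moreover, $v_\lambda$ and $v_n$ are Borel measurable, and satisfy the following Shapley equations:

$$\begin{aligned}
v_\lambda(k_1) &= \sup_{x \in \Delta(I)}\inf_{y \in \Delta(J)}\left\{\lambda g(k_1,x,y) + (1-\lambda)E^{k_1}_{x,y}(v_\lambda)\right\} \\
&= \inf_{y \in \Delta(J)}\sup_{x \in \Delta(I)}\left\{\lambda g(k_1,x,y) + (1-\lambda)E^{k_1}_{x,y}(v_\lambda)\right\}
\end{aligned}$$

and

$$\begin{aligned}
v_n(k_1) &= \sup_{x \in \Delta(I)}\inf_{y \in \Delta(J)}\left\{n^{-1}g(k_1,x,y) + (1-n^{-1})E^{k_1}_{x,y}(v_{n-1})\right\} \\
&= \inf_{y \in \Delta(J)}\sup_{x \in \Delta(I)}\left\{n^{-1}g(k_1,x,y) + (1-n^{-1})E^{k_1}_{x,y}(v_{n-1})\right\}.
\end{aligned}$$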


Let $X$ be the set of bounded Borel measurable functions from $K$ to $\mathbb{R}$, equipped with the uniform norm, and for all $(f,k) \in X \times K$, define $\Psi(f)(k) := \sup_{x \in \Delta(I)}\inf_{y \in \Delta(J)}\{g(k,x,y) + E^k_{x,y}(f)\}$.
We make the following assumption: 

Assumption 3. For all $f \in X$, $\Psi(f)$ is Borel measurable.

Remark 2.1. When $K$, $I$ and $J$ are compact metric spaces and $q$ and $g$ are jointly continuous, Assumptions 2 and 3 hold. Maitra and Parthasarathy [8] and Nowak [13] provided weaker conditions under which Assumptions 2 and 3 hold.
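In the finite case ($K$, $I$, $J$ finite, a special case of Remark 2.1), $g(k,x,y) + E^k_{x,y}(f)$ is bilinear in $(x,y)$, so $\Psi(f)(k)$ is the value of a one-shot matrix game at each state. The sketch below is our own illustration: it assumes $2 \times 2$ action sets (so that matrix-game values have a closed form) and deterministic transitions, and uses the classical Big Match as hypothetical data. For this game $v_n$ and $v_\lambda$ both equal $1/2$ at the non-absorbing state.

```python
from typing import List

Matrix = List[List[float]]


def value_2x2(A: Matrix) -> float:
    """Value of a 2 x 2 zero-sum matrix game (row player maximizes), mixed actions allowed."""
    (a, b), (c, d) = A
    maximin = max(min(a, b), min(c, d))
    minimax = min(max(a, c), max(b, d))
    if maximin == minimax:  # pure saddle point
        return maximin
    # no pure saddle point: both players mix and the classical formula applies
    return (a * d - b * c) / (a + d - b - c)


# Hypothetical data (the Big Match): state 0 is non-absorbing; states 1 and 2 are
# absorbing with payoffs 1 and 0. PAYOFF[k][i][j] = g(k, i, j), NEXT[k][i][j] is
# the next state (deterministic transitions are a simplification of the sketch).
PAYOFF = [[[1.0, 0.0], [0.0, 1.0]],
          [[1.0, 1.0], [1.0, 1.0]],
          [[0.0, 0.0], [0.0, 0.0]]]
NEXT = [[[1, 2], [0, 0]],
        [[1, 1], [1, 1]],
        [[2, 2], [2, 2]]]


def shapley(f: List[float]) -> List[float]:
    """Psi(f)(k) = sup_x inf_y {g(k, x, y) + E^k_{x,y}(f)}, as a matrix-game value."""
    return [value_2x2([[PAYOFF[k][i][j] + f[NEXT[k][i][j]] for j in range(2)]
                       for i in range(2)])
            for k in range(len(PAYOFF))]


def stage_value(n: int) -> List[float]:
    """v_n = n^{-1} Psi^n(0)."""
    f = [0.0] * len(PAYOFF)
    for _ in range(n):
        f = shapley(f)
    return [x / n for x in f]


def discounted_value(lam: float, tol: float = 1e-12, max_iter: int = 10**6) -> List[float]:
    """v_lambda as the fixed point of f -> lam * Psi((1 - lam) / lam * f)."""
    v = [0.0] * len(PAYOFF)
    for _ in range(max_iter):
        new = [lam * x for x in shapley([(1.0 - lam) / lam * y for y in v])]
        if max(abs(a - b) for a, b in zip(new, v)) <= tol:
            return new
        v = new
    raise RuntimeError("value iteration did not converge")


if __name__ == "__main__":
    print(stage_value(10), discounted_value(0.1))
```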

Theorem 1.2 yields the following Tauberian theorem for stochastic games: 

Theorem 2.2. Under Assumptions 2 and 3, the two following statements are equivalent: 

(a) The family of functions $(v_n)_{n \geq 1}$ converges uniformly on $K$ when $n$ goes to infinity.

(b) The family of functions $(v_\lambda)_{\lambda \in (0,1]}$ converges uniformly on $K$ when $\lambda$ goes to 0.

Moreover, when these statements hold, we have $\lim_{n \to +\infty} v_n = \lim_{\lambda \to 0} v_\lambda$.


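Proof. $X$ is a Banach space, and Assumption 3 ensures that $\Psi$ is well defined from $X$ to $X$. Moreover, $\Psi$ belongs to the class of operators described in Remark 1.1 (with $S = \Delta(I)$, $T = \Delta(J)$ and $P_{k,x,y} = E^k_{x,y}$), thus it is nonexpansive and satisfies Assumption 1. Hence Theorem 1.2 applies to $\Psi$. By Assumption 2, the families of values $(v_\lambda)$ and $(v_n)$ satisfy equations (1.1) and (1.2), and since convergence in $X$ is uniform convergence on $K$, the result is proved. □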


2.3 Stochastic games with signals 

Assume that $K$, $I$ and $J$ are finite. The previous model can be generalized in the following way. Let $A$ (resp. $B$) be a finite set of signals for Player 1 (resp. 2). Instead of observing the past actions $(i_m, j_m)$ and the next state $k_{m+1}$ at the end of each stage $m$, Player 1 (resp. 2) gets a private signal $a_m$ (resp. $b_m$), which is correlated with $(k_m, i_m, j_m)$ (see Mertens, Sorin and Zamir [10, Chapter IV] for more details). This defines a stochastic game with signals, denoted by $\Gamma$. A behavioral strategy for a player assigns a mixed action to each of his private histories (that is, all the actions he has played and all the signals he has received before the current stage). Because $K$, $I$, $J$, $A$ and $B$ are finite, the $\lambda$-discounted game and the $n$-stage game have a value for all $\lambda \in (0,1]$ and $n \in \mathbb{N}^*$.

By Mertens, Sorin and Zamir [10, Chapter III], there exists a stochastic game $\tilde{\Gamma}$ (in the sense of the previous subsection) with perfect observation of the state and actions, which is equivalent to $\Gamma$: it has the same $\lambda$-discounted and $n$-stage values. The state space of this auxiliary stochastic game is the set of infinite higher-order beliefs of the players about the state: this is the universal belief space, denoted by $\mathcal{B}$. The set $\mathcal{B}$ is compact metric, and Assumptions 2 and 3 are satisfied. Thus Theorem 2.2 applies to the auxiliary stochastic game $\tilde{\Gamma}$.
Proposition 2.3. The two following statements are equivalent:

(a) The family of functions $(v_n)_{n \geq 1}$ converges uniformly on $\mathcal{B}$ when $n$ goes to infinity.

(b) The family of functions $(v_\lambda)_{\lambda \in (0,1]}$ converges uniformly on $\mathcal{B}$ when $\lambda$ goes to 0.

Moreover, when these statements hold, we have $\lim_{n \to +\infty} v_n = \lim_{\lambda \to 0} v_\lambda$.

Remark 2.4. It is not known in general whether the families $(v_n)_{n \geq 1}$ and $(v_\lambda)_{\lambda \in (0,1]}$ are equicontinuous. Thus uniform convergence may be difficult to prove, even when pointwise convergence holds. In the examples of the next section, $(v_n)_{n \geq 1}$ and $(v_\lambda)_{\lambda \in (0,1]}$ are equi-Lipschitz, thus pointwise convergence and uniform convergence are equivalent in these examples.

3 Examples of zero-sum stochastic games 

We apply the results of the previous section to several standard examples of zero-sum stochastic 
games. 

3.1 Stochastic games with compact action sets and finite state space 

We consider the case where the state space is finite, the action sets are compact, and the transition and payoff functions are separately continuous. By the standard minmax theorem, Assumptions 2 and 3 hold (this is a particular case of Maitra and Parthasarathy [8]). Because $K$ is finite, uniform convergence and pointwise convergence with respect to the state variable are equivalent. Theorem 2.2 yields the following proposition:

Proposition 3.1. In a stochastic game with finite state space and compact action sets, the two 
following statements are equivalent: 

(a) For all $k_1 \in K$, $(v_n(k_1))$ converges when $n$ goes to infinity.

(b) For all $k_1 \in K$, $(v_\lambda(k_1))$ converges when $\lambda$ goes to 0.

Moreover, when these statements hold, we have $\lim_{n \to +\infty} v_n(k_1) = \lim_{\lambda \to 0} v_\lambda(k_1)$ for all $k_1 \in K$.

3.2 Hidden stochastic games 

Consider the following example of stochastic game with signals. Assume that $K$, $I$ and $J$ are finite, and that players do not observe the current state at each stage (they observe past actions). Instead, they receive a public signal about it, lying in some finite set $A$ (see Renault and Ziliotto [18] for more details). In this particular case, the universal belief space is $\mathcal{B} = \Delta(K)$: this corresponds to the common belief of the players about the state (see Ziliotto [27]). Thus, $(v_\lambda)$ and $(v_n)$ can be considered as families of maps from $\Delta(K)$ to $\mathbb{R}$. They are both equi-Lipschitz, thus for these families, pointwise convergence and uniform convergence are equivalent. By Proposition 2.3, the following result holds.

Proposition 3.2. In a hidden stochastic game, the two following statements are equivalent: 

(a) For all $p_1 \in \Delta(K)$, $(v_n(p_1))$ converges when $n$ goes to infinity.

(b) For all $p_1 \in \Delta(K)$, $(v_\lambda(p_1))$ converges when $\lambda$ goes to 0.

Moreover, when these statements hold, we have $\lim_{n \to +\infty} v_n(p_1) = \lim_{\lambda \to 0} v_\lambda(p_1)$ for all $p_1 \in \Delta(K)$.
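To illustrate how the common belief $p \in \Delta(K)$ evolves in a hidden stochastic game, the sketch below performs the Bayesian update after the observed actions and a public signal. It is our own illustration: we assume, as an encoding choice, that for each state and action pair a joint distribution on (next state, public signal) is given; the names and numbers are hypothetical.

```python
from typing import Dict, List, Tuple

# Q[k][(i, j)] maps (next_state, public_signal) to its probability: a hypothetical
# encoding of the joint law of the next state and the public signal.
Transition = List[Dict[Tuple[int, int], Dict[Tuple[int, str], float]]]


def update_belief(p: List[float], Q: Transition, i: int, j: int, signal: str) -> List[float]:
    """Common posterior on the next state after actions (i, j) and a public signal:
    p'(k') is proportional to sum_k p(k) Q[k][(i, j)][(k', signal)]."""
    n = len(p)
    weights = [sum(p[k] * Q[k][(i, j)].get((kp, signal), 0.0) for k in range(n))
               for kp in range(n)]
    total = sum(weights)
    if total == 0.0:
        raise ValueError("signal has probability zero under the current belief")
    return [w / total for w in weights]


# Example: two persistent states, one action per player, and a public signal that
# reports the true state with probability 0.8 (hypothetical numbers).
Q = [
    {(0, 0): {(0, "s0"): 0.8, (0, "s1"): 0.2}},
    {(0, 0): {(1, "s0"): 0.2, (1, "s1"): 0.8}},
]

if __name__ == "__main__":
    print(update_belief([0.5, 0.5], Q, 0, 0, "s0"))  # [0.8, 0.2]
```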
3.3 Markov chain games with incomplete information on both sides

Consider the following example of stochastic game with signals. Assume that $K$, $I$ and $J$ are finite, and that the state space is a product $K = C \times D$, such that the two components of the state follow independent Markov chains. Players know the transition and the initial distribution of each Markov chain, but only Player 1 (resp. 2) observes the realization at stage 1 of the first (resp. second) component. From stage 2 on, they do not observe the state. They observe past actions (see Gensbittel and Renault [3] for more details). In this particular case, the equivalent stochastic game with perfect observation of the state has state space $\Delta(C) \times \Delta(D)$, that is, the product of the set of possible beliefs of Player 2 about the initial state of the first Markov chain and of the set of possible beliefs of Player 1 about the initial state of the second Markov chain. Thus, $(v_\lambda)$ and $(v_n)$ can be considered as families of maps from $\Delta(C) \times \Delta(D)$ to $\mathbb{R}$. They are both equi-Lipschitz, thus for these families, pointwise convergence and uniform convergence are equivalent. Gensbittel and Renault [3] proved that $(v_n)$ converges, and asked whether $(v_\lambda)$ converges. By Remark 2.1, Assumptions 2 and 3 hold, and from Theorem 2.2 we deduce the following result:

Proposition 3.3. In a Markov chain game with incomplete information on both sides, for all $p_1 \in \Delta(C) \times \Delta(D)$, $(v_n(p_1))$ and $(v_\lambda(p_1))$ converge to the same limit.

4 An example 

In this section, we prove the following theorem: 

Theorem 4.1. There exists a hidden stochastic game such that, for some initial state $k_1 \in K$ known by both players, $(v_\lambda(k_1))$ and $(v_n(k_1))$ converge to distinct limits.

Remark 4.2. As proved in Ziliotto [27, Section 4, p. 21], this hidden stochastic game can be adapted in order to get an example of a stochastic game with compact action sets and finite state space such that, for some initial state $k_1 \in K$, $(v_\lambda(k_1))$ and $(v_n(k_1))$ converge to distinct limits. It is also possible to build an example of a hidden stochastic game such that, for some initial state $k_1 \in K$ known by both players, $(v_\lambda(k_1))$ converges but $(v_n(k_1))$ does not, and conversely, an example where $(v_n(k_1))$ converges but $(v_\lambda(k_1))$ does not.

Before turning to the proof, we provide some intuition. In Ziliotto [27], a hidden stochastic game $\Gamma$ is constructed, in which neither $(v_\lambda(k_1))$ nor $(v_n(k_1))$ converges, where $k_1$ is an initial state known by both players. In the discounted game, there exist optimal stationary strategies (that is, strategies which only depend on the common belief about the current state). In this example, a stationary strategy for Player 1 (resp. 2) is equivalent to the choice of an integer $a \in r\mathbb{N}$ (resp. $b \in 2r\mathbb{N}$). Apart from the fact that Player 2's set of stationary strategies is smaller, the game is symmetric. In $\Gamma_\lambda(k_1)$, the optimal choice for both players is to choose an integer as close as possible to $-\ln(\sqrt{2\lambda})/\ln(2)$. For some $\lambda$, the closest element of $r\mathbb{N}$ lies in $r(2\mathbb{N}+1)$, and Player 1 has an advantage, whereas for other discount factors, it lies in $2r\mathbb{N}$, and Player 1 has no advantage. This is why $(v_\lambda(k_1))$ oscillates (between 1/2 and some $l \in (1/2, 1]$). In $\Gamma_n(k_1)$, there may not exist optimal stationary strategies. Depending on the stage $m \in \{1, \dots, n\}$ of the game, the optimal integer for Player 1 lies in $2r\mathbb{N}$ or in $r(2\mathbb{N}+1)$. Thus, depending on the stage of the game, Player 1 may or may not have an advantage. This is in sharp contrast to the discounted game $\Gamma_\lambda(k_1)$, in which either Player 1 has an advantage at every stage of the game, or never has one. That is why we believe that in this example, $\liminf_{n \to +\infty} v_n(k_1) > 1/2$ (but we were not able to prove it). Nonetheless, one can construct a hidden stochastic game $\Gamma^1$, very similar to $\Gamma$, in which $\liminf_{n \to +\infty} v^1_n(\omega_1) > 1/2$, where $\omega_1$ is an initial state known by both players (this corresponds to Step 1 of the proof). In Step 2, we construct a hidden stochastic game $\Gamma^2$, whose only difference with $\Gamma^1$ is that Player 2's set of stationary strategies is equivalent to $r(2\mathbb{N}+1)$ instead of $2r\mathbb{N}$. In $\Gamma^2$, we also have $\liminf_{n \to +\infty} v^2_n(\omega_2) > 1/2$, where $\omega_2$ is an initial state known by both players. In Step 3, we define the hidden stochastic game $\Gamma^3$, in which, starting from an initial state $\omega_3$, Player 2 chooses between playing $\Gamma^1(\omega_1)$ or $\Gamma^2(\omega_2)$. In $\Gamma^3_\lambda(\omega_3)$, Player 1 does not have an advantage, because the optimal integer is the same at every stage of the game (either it always lies in $2r\mathbb{N}$, or it always lies in $r(2\mathbb{N}+1)$). Thus, $\lim_{\lambda \to 0} v^3_\lambda(\omega_3) = 1/2$, but $\liminf_{n \to +\infty} v^3_n(\omega_3) > 1/2$. It is then straightforward to construct in Step 4 a final example $\Gamma^4$ which proves the theorem.


Step 1 The game $\Gamma^1$.


In Ziliotto [27, Section 3, p. 12-19], for some $r > 2$, a hidden stochastic game $\Gamma$ with $3r + 4$ states, two signals $\{D, D'\}$ and two actions $\{C, Q\}$ for each player is constructed, such that for some initial state $k_1 = 1^{++} \in K$ known by both players, and for all $\lambda \in (0,1]$, the following properties hold:


Properties of $\Gamma_\lambda(k_1)$

(1) The game $\Gamma_\lambda(k_1)$ (that is, the $\lambda$-discounted game starting from state $k_1$, where players know the initial state) has the same value as the one-shot game $G_\lambda$, with action set $r\mathbb{N}$ for Player 1, $2r\mathbb{N}$ for Player 2, and payoff function

$$g_\lambda(a,b) := \frac{1 - f_\lambda(b)}{1 - f_\lambda(a)f_\lambda(b)}, \quad \text{where} \quad f_\lambda(n) := \frac{(1 - 2^{-n})(1 - \lambda^2)}{1 + 2^{n+1}\lambda(1-\lambda)^{-n} - \lambda} \in [0,1). \tag{4.1}$$


(2) $v_\lambda(k_1) > 1/2$.

(3) For $m \in \mathbb{N}^*$, define $n_m := 2^{4rm+2r+1}$. Then

$$\liminf_{m \to +\infty} v_{n_m}(k_1) > 3/4.$$

(4) Consider the one-shot game with action set $r\mathbb{N}$ for Players 1 and 2, and payoff function $g_\lambda$. The value of this game converges to 1/2 when $\lambda$ goes to 0.

Property (1) corresponds to [27, Proposition 3.3, p. 15]. Property (3) follows from the proof of [27, Theorem 3.6, p. 18-19]. In $G_\lambda$, a dominant strategy for Player 1 (resp. 2) is to maximize $f_\lambda$ over $r\mathbb{N}$ (resp. $2r\mathbb{N}$). Consequently, if the action set of Player 2 is changed into $r\mathbb{N}$, the value of this new game is equal to $[1 + \max_{a \in r\mathbb{N}} f_\lambda(a)]^{-1}$. This quantity is greater than 1/2, and it is not larger than the value of $G_\lambda$ (restricting the action set of Player 2 can only increase the value), thus Property (2) holds. By [27, Lemma 2.4, p. 10], this quantity goes to 1/2 as $\lambda$ goes to 0, thus Property (4) holds.
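The one-shot game $G_\lambda$ is simple enough to evaluate numerically. The sketch below implements $f_\lambda$ and $g_\lambda$ exactly as written in (4.1) and uses the dominant strategies described above (each player maximizes $f_\lambda$ over his own action set). Truncating $r\mathbb{N}$ to a finite range, restricting to $0 < \lambda < 1$, and the choice $r = 3$ in the usage are assumptions of the sketch, not part of the construction in [27]; the function names are ours.

```python
import math


def f_lam(lam: float, n: int) -> float:
    """f_lambda(n) as written in (4.1), for 0 < lam < 1."""
    return (1.0 - 2.0 ** -n) * (1.0 - lam ** 2) / (
        1.0 + 2.0 ** (n + 1) * lam * (1.0 - lam) ** -n - lam)


def g_lam(lam: float, a: int, b: int) -> float:
    """g_lambda(a, b) = (1 - f_lambda(b)) / (1 - f_lambda(a) f_lambda(b))."""
    fa, fb = f_lam(lam, a), f_lam(lam, b)
    return (1.0 - fb) / (1.0 - fa * fb)


def multiples(step: int, lam: float, r: int) -> range:
    """Positive multiples of `step` up to a cutoff (a multiple of r) beyond log2(1/lam).
    Truncating the infinite action sets is an assumption of this sketch."""
    top = r * (math.ceil(math.log2(1.0 / lam) / r) + 2)
    return range(step, top + 1, step)


def value_G(lam: float, r: int) -> float:
    """Value of G_lambda: Player 1 maximizes f over rN, Player 2 over 2rN (dominant strategies)."""
    a = max(multiples(r, lam, r), key=lambda n: f_lam(lam, n))
    b = max(multiples(2 * r, lam, r), key=lambda n: f_lam(lam, n))
    return g_lam(lam, a, b)


def value_symmetric(lam: float, r: int) -> float:
    """Value when both action sets are rN: [1 + max_{a in rN} f_lambda(a)]^{-1}."""
    return 1.0 / (1.0 + max(f_lam(lam, n) for n in multiples(r, lam, r)))


if __name__ == "__main__":
    for k in range(10, 31, 4):
        lam = 2.0 ** -k
        print(k, value_G(lam, 3), value_symmetric(lam, 3))
```

Comparing the two columns for several discount factors shows Player 1's advantage in $G_\lambda$ appearing and vanishing as $\lambda$ varies, while the symmetric value decreases towards 1/2, in line with Properties (2) and (4).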

Define another hidden stochastic game $\Gamma^1$ by adding a new state $\omega_1$ to $\Gamma$ (the action and signal sets are unchanged). The game is described by the following figure:

Figure 1: The game $\Gamma^1$


The payoff in state $\omega_1$ is 1/2, and Player 1 controls this state (that is, the transition in $\omega_1$ is independent of Player 2's actions). When Player 1 plays action C in state $\omega_1$, the game remains in state $\omega_1$ with probability 1, and when Player 1 plays action Q, the game goes to state $k_1$ with probability 1, and the game $\Gamma(k_1)$ is played.

We have the following proposition:

Proposition 4.3.

(i) The game $\Gamma^1_\lambda(\omega_1)$ has the same value as the one-shot game with action set $r\mathbb{N}$ for Player 1, $2r\mathbb{N}$ for Player 2, and payoff function $\hat{g}_\lambda(a,b) := \lambda/2 + (1-\lambda)g_\lambda(a,b)$.

(ii) The sequence of values $(v^1_n)$ of $\Gamma^1$ satisfies

$$\liminf_{n \to +\infty} v^1_n(\omega_1) > 1/2.$$


Proof.

(i) It follows from Property (2) that an optimal strategy for Player 1 in $\Gamma^1_\lambda(\omega_1)$ is to play Q at stage 1, then an optimal strategy in $\Gamma_\lambda(k_1)$, which proves the result.

(ii) By Property (3), there exists $m_0 \in \mathbb{N}^*$ such that for all $m \geq m_0$, $v_{n_m}(k_1) > 2/3$. Let $n > n_{m_0}$, and let $m \geq m_0$ be such that $n_m < n \leq n_{m+1}$. Define the following strategy for Player 1 in $\Gamma^1_n(\omega_1)$: play C (thus stay in $\omega_1$) up to stage $n - n_m - 1$, play Q at stage $n - n_m$, then from stage $n - n_m + 1$ on, play an optimal strategy in $\Gamma_{n_m}(k_1)$. This strategy guarantees

$$\frac{1}{n}\left[\frac{1}{2}(n - n_m) + \frac{2}{3}n_m\right] = \frac{1}{2} + \frac{1}{6}\frac{n_m}{n} \geq \frac{1}{2} + \frac{1}{6}2^{-4r},$$

since $n \leq n_{m+1} = 2^{4r}n_m$. Thus $\liminf_{n \to +\infty} v^1_n(\omega_1) > 1/2$.


□ 


Step 2 The game $\Gamma^2$.

One can construct a hidden stochastic game $\Gamma'$ similar to the example in [27, Section 3, p. 12-19], with $3r + 4$ states, two signals $\{D, D'\}$ and two actions $\{C, Q\}$ for each player, such that for some initial state $k_1 \in K$ known by both players, the following properties hold:

(1) The game $\Gamma'_\lambda(k_1)$ has the same value as the one-shot game $G'_\lambda$, with action set $r\mathbb{N}$ for Player 1, $r(2\mathbb{N}+1)$ for Player 2, and payoff function $g_\lambda$ defined by equation (4.1).

(2) $v'_\lambda(k_1) > 1/2$.

(3) For $m \in \mathbb{N}^*$, define $n_m := 2^{4rm+1}$. Then

$$\liminf_{m \to +\infty} v'_{n_m}(k_1) > 2/3.$$

Using the same construction as in the previous step, adding one more state $\omega_2$ to $\Gamma'$, we obtain a hidden stochastic game $\Gamma^2$ which satisfies the analogue of Proposition 4.3:

Proposition 4.4.

(i) The game $\Gamma^2_\lambda(\omega_2)$ has the same value as the one-shot game $G^2_\lambda$, with action set $r\mathbb{N}$ for Player 1, $r(2\mathbb{N}+1)$ for Player 2, and payoff function $\hat{g}_\lambda(a,b) := \lambda/2 + (1-\lambda)g_\lambda(a,b)$.

(ii) The sequence of values $(v^2_n)$ of $\Gamma^2$ satisfies

$$\liminf_{n \to +\infty} v^2_n(\omega_2) > 1/2.$$


Step 3 The game $\Gamma^3$.

Denote by $K_1$ (resp. $K_2$) the state space of $\Gamma^1$ (resp. $\Gamma^2$), and define the hidden stochastic game $\Gamma^3$ with state space $K := K_1 \cup K_2 \cup \{\omega_3\}$, action sets $I := J := \{C, Q\}$ and signal set $A := \{D, D'\}$. The game is described by the following figure:

Figure 2: The game $\Gamma^3$

The payoff is 1/2 in state $\omega_3$, and Player 2 controls this state. If Player 2 plays C in $\omega_3$, the game $\Gamma^1(\omega_1)$ is played from the next stage on, and if he plays Q, the game $\Gamma^2(\omega_2)$ is played from the next stage on.

Proposition 4.5.

(i) The game $\Gamma^3_\lambda(\omega_3)$ has the same value as the one-shot game $G^3_\lambda$, with action set $r\mathbb{N}$ for Player 1 and Player 2, and payoff function $\tilde{g}_\lambda := \lambda(2-\lambda)/2 + (1-\lambda)^2 g_\lambda$, where $g_\lambda$ is described by equation (4.1). In particular, $(v^3_\lambda(\omega_3))$ converges to 1/2.

(ii) The sequence of values $(v^3_n)$ of $\Gamma^3$ satisfies

$$\liminf_{n \to +\infty} v^3_n(\omega_3) > 1/2.$$

Proof.

(i) The first point follows from Proposition 4.3 (i) and Proposition 4.4 (i), and the second point is a consequence of Property (4) in Step 1.

(ii) This is a consequence of Proposition 4.3 (ii) and Proposition 4.4 (ii).


□ 


Step 4 Final example and proof of the theorem. 

Let $x$ be such that $1/2 < x < \liminf_{n \to +\infty} v^3_n(\omega_3)$. Define a hidden stochastic game $\Gamma^4$ by adding two more states $\omega_4$ and $x^*$ to $\Gamma^3$, as described in the following figure:
Figure 3: The game $\Gamma^4$



In state $\omega_4$, Player 2 has two options: play C and play the game $\Gamma^3(\omega_3)$ from the next stage on, or play Q and get payoff $x$ forever. Because $\lim_{\lambda \to 0} v^3_\lambda(\omega_3) = 1/2 < x$, for $\lambda$ small enough, playing action C at stage 1 in $\Gamma^4_\lambda(\omega_4)$ is optimal for Player 2. Thus

$$\lim_{\lambda \to 0} v^4_\lambda(\omega_4) = 1/2.$$

Because $\liminf_{n \to +\infty} v^3_n(\omega_3) > x$, for $n$ large enough, playing action Q at stage 1 in $\Gamma^4_n(\omega_4)$ is optimal for Player 2. Thus

$$\lim_{n \to +\infty} v^4_n(\omega_4) = x,$$

and the theorem is proved.